Scalable ad-hoc entity extraction from text collections

Authors

  • Sanjay Agrawal
  • Kaushik Chakrabarti
  • Surajit Chaudhuri
  • Venkatesh Ganti
Abstract

Supporting entity extraction from large document collections enables a variety of important data analysis tasks. In this paper, we introduce the “ad-hoc” entity extraction task, where the entities of interest are constrained to come from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques, which process all the documents for each ad-hoc entity extraction task, can be very expensive. We propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
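The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it describes: use an inverted index to narrow the collection to candidate documents, then run the more expensive matching on those candidates alone. The function names, the whitespace tokenizer, the AND-query filter, and the sample documents are all assumptions made for illustration; they are not the authors' method.

```python
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def candidate_docs(index, entity):
    """Ids of documents containing every token of the entity (an AND query)."""
    postings = [index.get(token, set()) for token in tokenize(entity)]
    return set.intersection(*postings) if postings else set()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside doc_tokens."""
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def extract(docs, entities):
    """Return (entity, doc_id) pairs, scanning only index-selected documents."""
    index = build_inverted_index(docs)
    mentions = []
    for entity in entities:
        phrase = tokenize(entity)
        for doc_id in sorted(candidate_docs(index, entity)):
            # Token co-occurrence does not guarantee a contiguous mention,
            # so each candidate document is verified before it is reported.
            if contains_phrase(tokenize(docs[doc_id]), phrase):
                mentions.append((entity, doc_id))
    return mentions


# Made-up sample data, for illustration only.
docs = {
    1: "the canon eos 400d ships with a kit lens",
    2: "canon announced a new printer",
    3: "nikon d80 review and sample photos",
    4: "eos is a greek goddess and canon is a law",
}
entities = ["canon eos 400d", "nikon d80"]

print(extract(docs, entities))
```

In this sketch the index lookup plays the role of the cheap filter: for the entity "canon eos 400d" only document 1 is ever tokenized and scanned, while documents 2, 3, and 4 are skipped. The verification step remains necessary because an AND query can select documents, such as document 4 for the query "canon eos", in which the tokens appear but not as a contiguous mention.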


Similar resources

Pipelines for Ad-hoc Large-scale Text Mining

Pipelines for Ad-hoc Large-scale Text Mining Today’s web search and big data analytics applications aim to address information needs (typically given in the form of search queries) ad-hoc on large numbers of texts. In order to directly return relevant information instead of only returning potentially relevant texts, these applications have begun to employ text mining. The term text mining cover...


Scalable Phrase Mining for Ad-hoc Text Analytics

Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...


Design and evaluation of two scalable protocols for location management of mobile nodes in location based routing protocols in mobile Ad Hoc Networks

Several position-based routing protocols have been developed for mobile ad hoc networks. Many of these protocols assume that a location service is available which provides location information on the nodes in the network. Our solutions decrease location updates without loss of query success rate or throughput, and even increase those. Simulation results show that our methods are effectiv...



Interesting-Phrase Mining for Ad-Hoc Text Analytics

Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...



Journal:
  • PVLDB

Volume 1

Publication date: 2008